DETAILED ACTION

1. Applicant's amendments after the Final Office action, filed June 13, 2008, have been
entered and made of record.

2. Applicant's arguments (see pages 12-13), filed June 13, 2008, with respect to the
rejection(s) of claim(s) 1 under Simard et al. in view of Viscito et al. and Serizawa et al.
have been fully considered and are persuasive. Therefore, the rejection has been
withdrawn. However, upon further consideration, a new ground(s) of rejection is made
in view of Zhang et al., "Detection of Text Captions in Compressed Domain Video".

Claim Rejections - 35 USC § 103

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1-2, 4, 11-12, and 14 are rejected under 35 U.S.C. 103(a) as being
unpatentable over Simard et al. (US 7,024,039) in view of Zhang et al. (Detection of Text
Captions in Compressed Domain Video, 2000, pp. 201-204, Vol. 8) and Hirosawa et al.
(US 6,720,965).

(1) Regarding claims 1 and 11: 

Simard et al. disclose a system (device) and method (column 1, lines 16-17) for
facilitating image retouching, comprising:

an input part (100 in Fig. 1) for receiving an input image (column 6, lines 17-20);
converting pixels in the character blocks into pixels having a first brightness value
and pixels in the background blocks into pixels having a second brightness value
(column 6, lines 41-44).
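For illustration only, the block-wise conversion recited above (every pixel of a character block set to a first brightness value, every pixel of a background block set to a second) can be sketched in Python as follows. The function name, the block-map representation (a 2-D list of booleans, True for a character block), and the values 0 and 255 are assumptions made for this sketch; they are not taken from Simard et al. or from the claims.

```python
# Assumed brightness values; the claims only require two distinct values.
FIRST_BRIGHTNESS = 0      # used for pixels of character blocks
SECOND_BRIGHTNESS = 255   # used for pixels of background blocks


def binarize_blocks(image, block_map, block_size):
    """Return a new image with the dimensions of `image` in which each pixel
    takes FIRST_BRIGHTNESS if its block is a character block (True in
    block_map) and SECOND_BRIGHTNESS otherwise."""
    height, width = len(image), len(image[0])
    return [[FIRST_BRIGHTNESS if block_map[y // block_size][x // block_size]
             else SECOND_BRIGHTNESS
             for x in range(width)]
            for y in range(height)]
```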

However, Simard et al. do not explicitly teach a block classification part for
classifying the input image into character blocks and background blocks using block
energy values and a block energy threshold; a position search part for searching for the
left, right, top and bottom positions of a character region by horizontally and vertically
scanning the block-classified image, and determining a position of the character region;
a region of contents (ROC) extraction part for extracting an image in the determined
position of the character region from the input image; and an ROC extension part for
extending the detected image of the character region to a size of the input image.

(a) Obviousness in view of Zhang et al. 

Zhang et al. teach the block classification part for classifying the input image into 
character blocks and background blocks (page 202, left column, lines 2-5, and 
paragraph [2.2]) using block energy values and a block energy threshold (Fig. 3, page 
paragraph [2.2], lines 33-37) and a position search part for searching for left, right, top 
and bottom positions of a character region by horizontally and vertically scanning the 
block-classified image, and determining a position of the character region (page 202, 
left column, lines 4-15), and a region of contents (ROC) extraction part for extracting an 
image in the determined position of the character region from the input image (page 
202, right column, lines 44-46). 
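A minimal, hypothetical sketch of the position search and ROC extraction described for Zhang et al.: scan the block-classified map vertically and horizontally for the outermost character blocks, then crop the corresponding pixel area from the input image. The function names and the block-map representation (a 2-D list of booleans) are illustrative assumptions, not the reference's own implementation.

```python
def find_character_region(block_map):
    """Return (top, bottom, left, right) block indices enclosing all character
    blocks (True entries), or None if the map has no character blocks."""
    # Vertical scan: block rows that contain at least one character block.
    rows = [r for r, row in enumerate(block_map) if any(row)]
    if not rows:
        return None
    # Horizontal scan: block columns that contain at least one character block.
    cols = [c for c in range(len(block_map[0]))
            if any(row[c] for row in block_map)]
    return rows[0], rows[-1], cols[0], cols[-1]


def extract_region(image, bounds, block_size):
    """Crop the pixels covered by the bounding blocks from the input image."""
    top, bottom, left, right = bounds
    return [row[left * block_size:(right + 1) * block_size]
            for row in image[top * block_size:(bottom + 1) * block_size]]
```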
It is desirable to develop a binarized contrast feature domain that provides a
superior way to identify and extract text regions in video frames. The Zhang et al.
teaching of identifying text frames achieves this goal. Therefore, it would have been
obvious to one having ordinary skill in the art at the time of the invention to replace
Simard et al. elements 130 and 110 with the Zhang et al. teaching of classifying the
input image, determining the position of the character region, and extracting an image,
because such a combination provides a superior way to identify and extract text regions
in video frames (page 201, left column, lines 28-32).

(b) Obviousness in view of Hirosawa et al.

Hirosawa et al. teach extending the detected image of the character region to a 
size of the input image (Figs. 27-30, and Fig. 36, col. 24, lines 57-58, and col. 30, lines 
22-31). 

It is desirable to display an enlarged image with good operability while keeping a
large amount of information displayed on each output image. The Hirosawa et al.
teaching of extending an input image achieves this goal. Therefore, it would have
been obvious to one having ordinary skill in the art at the time of the invention to
replace element 120 of the combination of Simard et al. and Zhang et al. with the
Hirosawa et al. teaching of extending an image, because such a combination displays an
enlarged image with good operability while keeping a large amount of information
displayed on each output image (col. 1, lines 61-67).
(2) Regarding claims 2 and 12:

Simard et al. teach dividing the input image into blocks having a
predetermined size (column 3, lines 26-27), and filling the character blocks with pixels
converted to have the first brightness value and filling the background blocks with pixels
converted to have the second brightness value (column 6, lines 41-44).

However, Simard et al. do not explicitly teach DCT-converting the divided blocks
output from the image division part; calculating a sum of absolute values of dominant
DCT coefficients in each of the DCT-converted blocks, and outputting the calculated sum
as the energy value of the corresponding block; summing up the energy values
calculated for the respective blocks, output from the energy calculation part, and
generating the threshold value by dividing the summed energy value by the total
number of the blocks; and receiving the block energy values output from the energy
calculation part, and classifying the blocks into character blocks or background blocks
by comparing the received block energy values with the threshold value.

Zhang et al. teach DCT-converting the divided blocks output from the image
division part (page 202, left column, lines 2-5); calculating a sum of absolute values of
dominant DCT coefficients in each of the DCT-converted blocks (see formulas (1) and
(2), page 202, left column, lines 7-11), and outputting the calculated sum as the energy
value of the corresponding block (page 202, left column, lines 12-15); summing up the
energy values calculated for the respective blocks, output from the energy calculation
part (formulas (4), (5), and (6), page 202, right column, lines 2-23), and generating the
threshold value by dividing the summed energy value by the total number of the blocks
(page 202, right column, lines 33-37); and receiving the block energy values output from
the energy calculation part (page 202, right column, lines 37-41), and classifying the
blocks into character blocks or background blocks (page 202, left column, lines 2-5, and
paragraph [2.2]) by comparing the received block energy values with the threshold
value (page 202, right column, lines 33-37).
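The block energy and threshold steps of claims 2 and 12 can be sketched as below. This is a hedged illustration assuming 8x8 blocks, an orthonormal 2-D DCT-II, and a hypothetical set of "dominant" coefficients (five low-frequency AC terms); it reproduces neither Zhang et al.'s actual coefficient selection nor the equation of claims 3 and 13. The threshold is the summed block energy divided by the number of blocks, and a block whose energy exceeds it is treated as a character block.

```python
import math

BLOCK_SIZE = 8
# Hypothetical "dominant" coefficients: five low-frequency AC terms (u, v).
# The selection actually used by Zhang et al. or the claims is not assumed here.
DOMINANT = [(0, 1), (1, 0), (1, 1), (0, 2), (2, 0)]


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block (u indexes rows, v columns)."""
    n = len(block)

    def alpha(k):
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for y in range(n):
                for x in range(n):
                    total += (block[y][x]
                              * math.cos((2 * y + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * x + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def block_energy(block):
    """Sum of absolute values of the selected dominant DCT coefficients."""
    coeffs = dct2(block)
    return sum(abs(coeffs[u][v]) for u, v in DOMINANT)


def split_blocks(image, size=BLOCK_SIZE):
    """Divide the image into size x size blocks; assumes both dimensions are
    multiples of size. Result is indexed as blocks[block_row][block_col]."""
    height, width = len(image), len(image[0])
    return [[[row[bx:bx + size] for row in image[by:by + size]]
             for bx in range(0, width, size)]
            for by in range(0, height, size)]


def classify_blocks(image, size=BLOCK_SIZE):
    """Return a block map: True for character blocks, False for background."""
    energies = [[block_energy(b) for b in row] for row in split_blocks(image, size)]
    flat = [e for row in energies for e in row]
    threshold = sum(flat) / len(flat)  # summed energy divided by block count
    return [[e > threshold for e in row] for row in energies]
```

Using a strict comparison means that only blocks with above-average energy, such as blocks containing sharp text edges, are marked as character blocks.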

It is desirable to develop a binarized contrast feature domain that provides a
superior way to identify and extract text regions in video frames. The Zhang et al.
teaching of DCT conversion achieves this goal. Therefore, it would have been
obvious to one having ordinary skill in the art at the time of the invention to combine
the Zhang et al. teaching of energy calculation, threshold calculation, and block
classification with the Simard et al. teaching, because such a combination provides a
superior way to identify and extract text regions in video frames (page 201, left column,
lines 28-32).

(3) Regarding claims 4 and 14: 

The combination of Simard et al. and Zhang et al. teaches parent claims 1 and
11. However, the combination of Simard et al. and Zhang et al. does not explicitly teach
the aspect ratio of the input image.

Hirosawa et al. teach an aspect ratio of an image (col. 29, lines 44-45).

It is desirable to display an enlarged image with good operability while keeping a
large amount of information displayed on each output image. The Hirosawa et al.
teaching of an aspect ratio of an image achieves this goal. Therefore, it would have
been obvious to one having ordinary skill in the art at the time of the invention to apply



Application/Control Number: 10/765,071 Page 7 

Art Unit: 2624 

the Hirosawa et al. teaching of the image aspect ratio to the combination of Simard et al.
and Zhang et al., because such a combination displays an enlarged image with good
operability while keeping a large amount of information displayed on each output image
(col. 1, lines 61-67).

5. Claims 3 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Simard et al., Zhang et al. and Hirosawa et al., as applied to claims 2 and 12 above,
and further in view of Viscito et al. (US 6,782,135).

The combination of Simard et al., Zhang et al. and Hirosawa et al. teaches
parent claims 1-2 and 11-12. Furthermore, Zhang et al. teach the energy value of
each block calculated by the equation recited in claims 3 and 13 (formula (2), page
202, left column, line 9).

However, the combination of Simard et al., Zhang et al. and Hirosawa et al. does
not explicitly teach that each block has a size of 8x8 pixels.

Viscito et al., in an analogous environment, teach a method and system in which
each block consists of 8 rows by 8 columns of pixels (column 9, line 55).

It is desirable for the system to be able to model the human visual system. The
Viscito et al. teaching of blocks of 8 rows by 8 columns of pixels achieves this goal.
Therefore, it would have been obvious to one having ordinary skill in the art at the time
of the invention to apply the Viscito et al. teaching of 8-row by 8-column pixel blocks to
the combination of Simard et al., Zhang et al. and Hirosawa et al., because such a
combination enables accurate and efficient video quantization and makes it possible to
model the human visual system (column 2, lines 55-57).

6. Claims 5 and 15 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Simard et al., Zhang et al. and Hirosawa et al., as applied to claims 1 and 11 above, 
and further in view of the admitted prior art (see pages 18 and 19 in the specification). 

The combination of Simard et al., Zhang et al. and Hirosawa et al. teaches
parent claims 1 and 11. However, the combination of Simard et al., Zhang et al. and
Hirosawa et al. does not explicitly teach performing bilinear interpolation in
accordance with the formula recited in claims 5 and 15.

The admitted prior art discloses the bilinear interpolation method and operation,
as well as the formula of claims 5 and 15 (equation (4), page 18, line 28).
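As a hedged illustration, standard bilinear interpolation for enlarging an extracted region to the size of the input image might look like the sketch below. It maps the output corners onto the input corners, which is an assumption of this sketch; it is not a transcription of equation (4) of the specification, which is not reproduced in this action.

```python
def bilinear_resize(image, out_h, out_w):
    """Resize a 2-D list of numbers to out_h x out_w by bilinear interpolation,
    aligning the corner pixels of the input and output grids."""
    in_h, in_w = len(image), len(image[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        # Source row coordinate and its two neighboring input rows.
        sy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        dy = sy - y0
        for x in range(out_w):
            # Source column coordinate and its two neighboring input columns.
            sx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            dx = sx - x0
            # Interpolate horizontally on both rows, then vertically.
            top = (1 - dx) * image[y0][x0] + dx * image[y0][x1]
            bottom = (1 - dx) * image[y1][x0] + dx * image[y1][x1]
            out[y][x] = (1 - dy) * top + dy * bottom
    return out
```

For example, enlarging [[0, 10], [20, 30]] to 3x3 gives a center value of 15.0, the average of the four input pixels.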

It is desirable to extend the output image to the size of the input image without
affecting the quality of the image. The admitted prior art's use of bilinear interpolation
achieves this goal. Therefore, it would have been obvious to one having ordinary skill in
the art at the time of the invention to apply the admitted prior art teaching of bilinear
interpolation to the combination of Simard et al., Zhang et al. and Hirosawa et al.,
because such a combination enlarges the output image to the size of the input image
without affecting the quality of the image.

7. Claims 6-7 and 16-17 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Simard et al. (US 7,024,039) in view of Zhang et al. (Detection of Text Captions in
Compressed Domain Video, 2000, pp. 201-204, Vol. 8), Hirosawa et al. (US 6,720,965),
and Otsuka (US 6,731,820).

(1) Regarding claims 6 and 16: 

The rejection of claims 1 and 11 applies to claims 6 and 16. 

However, the combination of Simard et al., Zhang et al., and Hirosawa et al. does not
explicitly teach performing median filtering on the image output from the block
classification part to remove blocks erroneously classified as character blocks.

Otsuka teaches using a median filter to perform median filtering on an image output
(see the Abstract, lines 1-2) from the block classification part to remove blocks
erroneously classified as character blocks (paragraph [0016], lines 6-7) (the removal of
noise in an image is read as the same concept as the removal of blocks erroneously
classified as character blocks).

It is desirable to realize a large-scale nonlinear filter as a digital circuit. The
Otsuka teaching of the median filter achieves this goal. Therefore, it would have
been obvious to one having ordinary skill in the art at the time of the invention to apply
the Otsuka teaching of the median filter to the combination of Simard et al., Zhang et al.,
and Hirosawa et al., because such a combination realizes a large-scale nonlinear filter as
a digital circuit (column 1, lines 63-65).

(2) Regarding claims 7 and 17: 

The combination of Simard et al., Zhang et al., and Hirosawa et al. discloses parent
claims 6 and 16. However, the combination of Simard et al., Zhang et al., and Hirosawa
et al. does not explicitly teach that the median filter determines isolated character
blocks to be erroneously classified character blocks.

Otsuka teaches using a median filter to perform median filtering on an image output
(see the Abstract, lines 1-2), in which isolated character blocks are determined to be
erroneously classified character blocks (paragraph [0016], lines 6-7) (the isolated
character blocks are read as noise).
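A minimal sketch of how a median filter can discard isolated character blocks: with a 3x3 window over the block map, a block remains a character block only if at least five of the nine window entries are character blocks, so a lone character block is reset to background. The window size, the function name, and the treatment of out-of-range neighbors as background are assumptions of this sketch, not details taken from Otsuka.

```python
def median_filter_blocks(block_map):
    """Apply a 3x3 median filter to a boolean block map (True = character
    block), treating positions outside the map as background."""
    h, w = len(block_map), len(block_map[0])
    out = [[False] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            window = []
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    inside = 0 <= rr < h and 0 <= cc < w
                    window.append(1 if inside and block_map[rr][cc] else 0)
            window.sort()
            out[r][c] = window[4] == 1  # median of the 9 window values
    return out
```

Note that the corner blocks of a solid 3x3 character region are also removed, a known side effect of median filtering with this window.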

It is desirable to realize a large-scale nonlinear filter as a digital circuit. The
Otsuka teaching of the median filter achieves this goal. Therefore, it would have
been obvious to one having ordinary skill in the art at the time of the invention to apply
the Otsuka teaching of the median filter to the combination of Simard et al., Zhang et al.,
and Hirosawa et al., because such a combination realizes a large-scale nonlinear filter as
a digital circuit (column 1, lines 63-65).

8. Claims 8-9 and 18-19 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Simard et al. (US 7,024,039) in view of Zhang et al. (Detection of Text Captions in
Compressed Domain Video, 2000, pp. 201-204, Vol. 8), Hirosawa et al. (US 6,720,965),
Otsuka (US 6,731,820), and Alderson et al. (US-PGPUB 2002/0159648).

(1) Regarding claims 8 and 18: 

The rejection of claims 1, 6, 11, and 16 applies to claims 8 and 18.
However, the combination of Simard et al., Zhang et al., Hirosawa et al., and Otsuka
does not explicitly teach performing mean filtering on the input image to blur the
input image.

Alderson et al. teach performing mean filtering on the input image to blur the
input image (Fig. 7, step 710, paragraph [0070], lines 12-13).
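For illustration, a simple mean (box) filter that blurs an image by replacing each pixel with the average of its neighborhood is sketched below. The 3x3 window (radius 1), the function name, and the choice to average only in-bounds pixels at the image edges are assumptions; the filter parameters used by Alderson et al. are not given in this action.

```python
def mean_filter(image, radius=1):
    """Blur a 2-D list of numbers by averaging each pixel with its neighbors
    in a (2*radius+1) x (2*radius+1) window clipped to the image bounds."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out
```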

It is desirable to remove the gradient data. The Alderson et al. use of a mean
filter achieves this goal. Therefore, it would have been obvious to one having
ordinary skill in the art at the time of the invention to apply the Alderson et al. teaching
of the mean filter to the combination of Simard et al., Zhang et al., Hirosawa et al., and
Otsuka, because such a combination removes the gradient data and can be applied, for
example, to imagery previously collected and stored in a memory (paragraph [0006]).

(2) Regarding claims 9 and 19: 

The rejection of claims 1, 6, 8, 11, 16 and 18 applies to claims 9 and 19.
Furthermore, Hirosawa et al. teach extending the median-filtered image to the
size of the input image (Figs. 27-30, and Fig. 36, col. 24, lines 57-58, and col. 30, lines
22-31) (extending the median-filtered image to the size of the input image is read
as the same concept as extending the character region to the size of the input
image).

However, the combination of Simard et al., Zhang et al., Hirosawa et al., and Otsuka
does not explicitly teach subsampling the pixels in the image output from the block
classification part to reduce the number of pixels, and interpolating the
median-filtered image.
Alderson et al. teach subsampling the pixels in the image to reduce the
number of pixels (Fig. 4, paragraph [0043], lines 11-14), and interpolating the
median-filtered image (paragraph [0071], lines 6-9).

It is desirable to remove the gradient data. The Alderson et al. approach of
interpolating the image using bilinear interpolation achieves this goal.
Therefore, it would have been obvious to one having ordinary skill in the art at the time
of the invention to apply the Alderson et al. teaching of bilinear interpolation to the
combination of Simard et al., Zhang et al., Hirosawa et al., and Otsuka, because such a
combination removes the gradient data and can be applied, for example, to imagery
previously collected and stored in a memory (paragraph [0006]).

9. Claims 10 and 20 are rejected under 35 U.S.C. 103(a) as being unpatentable
over Simard et al., Zhang et al., Hirosawa et al., Otsuka, and Alderson et al., as
applied to claims 9 and 19 above, and further in view of Astle (US 5,684,544).

The combination of Simard et al., Zhang et al., Hirosawa et al., Otsuka, and
Alderson et al. teaches parent claim 9. However, the combination of Simard et al.,
Zhang et al., Hirosawa et al., Otsuka, and Alderson et al. does not explicitly teach the
subsampling ratio (2:1)².

Astle teaches subsampling pixels using a ratio of 4:1 (column 5, lines 50-51) (the
4:1 ratio is read as the ratio (2:1)²).
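Reading (2:1)² as 2:1 subsampling in each of the horizontal and vertical directions, the pixel count drops by a factor of four, which is how a 4:1 ratio can be equated with (2:1)². A minimal Python sketch of that subsampling, keeping every second pixel in each direction (the function name is hypothetical, and this reading of the claim is an assumption):

```python
def subsample(image, step=2):
    """Keep every `step`-th row and every `step`-th column of a 2-D list;
    with step=2 the number of pixels is reduced by a factor of 4."""
    return [row[::step] for row in image[::step]]
```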

It is desirable to have improved computer-implemented processes for
upsampling chrominance signals. The Astle approach of using the 4:1 ratio
achieves this goal. Therefore, it would have been obvious to one having ordinary
skill in the art at the time of the invention to apply Astle's teaching of the 4:1 ratio
to the combination of Simard et al., Zhang et al., Hirosawa et al., Otsuka, and
Alderson et al., because such a combination provides improved computer-implemented
processes for upsampling chrominance signals (column 2, lines 29-31), so that the
subsampled pixels can be encoded and transmitted with a smaller code size, which
speeds up the filtering process in the median filter part.

Contact information: 

10. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Amara Abdi, whose telephone number is (571) 270-1670.
The examiner can normally be reached Monday through Friday, 8:00 AM to 4:00 PM
ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Jingge Wu can be reached on (571) 272-7429. The fax phone number for 
the organization where this application or proceeding is assigned is 571-273-8300. 
Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/Jingge Wu/ 

Supervisory Patent Examiner, Art Unit 2624 



/Amara Abdi/ 
Examiner, Art Unit 2624 